The preponderance of literature indicates that the arrangement
of items on a test according to item difficulty has no
significant effect on test performance (Brenner, 1964; Flaugher,
Melton and Myers, 1968; French and Greer, 1964; Marso, 1970;
Monk and Stallings, 1970; Munz and Smouse, 1968; Smouse and Munz,
1968; Sax and Cromack, 1966). That is, an S would obtain the
same score on a test with any one of the following item
arrangements: 1) random order of item difficulty (R), 2) ascending
level of difficulty, easy items followed by more difficult items
(E-H), 3) descending level of difficulty, difficult items
followed by easier items (H-E). Three studies (Hambleton, 1968;
Lund, 1953; MacNicol, 1956) with experimental evidence, and
several authors of measurement texts (Davis, 1951; Gronlund, 1965;
Stanley and Ross, 1954) without referenced experimental evidence,
recommend that test items be arranged in the easy to hard format.
Although the preceding textbook authors were partly referring to
speeded tests, and other tests where item difficulty is used to
discourage the Ss from continuing (e.g., Scholastic Aptitude
Test, Graduate Record Examinations, etc.), their advice has been
applied to all test forms. Finally, Heim (1955) found that Ss
scored significantly higher on tests where items were arranged
H-E, compared with the E-H arrangement.

The purpose of the present study is to consider the problem
of item arrangement in light of Helson's (1930) adaptation level
theory. Studies involving adaptation level¹ are generally
concerned with supplying evidence which may help to answer the
question, "Why do things appear as they do?" Helson (1964)
indicates that, "Judgements are relative to prevailing norms or
adaptation levels. Thus a 4-ounce fountain pen is heavy, but a
baseball bat to be heavy must weigh over 40 ounces (p. 26)."
Current trends and issues in adaptation level theory cover a
[illegible] from [illegible] to social psychology (Helson,
[year illegible]).



Adaptation level theory applied in a study of item
arrangement would predict that Ss taking a test on which
[the items are arranged H-E would adapt to the difficult items,
and the subsequent items would] subjectively seem even easier.
With an E-H arrangement of test items the S would adapt to the
easy items and the subsequent items would appear to be more
difficult than these same items in the H-E context. Perception
of an [illegible] the item, the [illegible], would also
[illegible] the scores of Ss taking the test compared
with Ss taking these same items arranged E-M-H.

None of the studies concerned with the item-order effects
considered the S's perception of the items he was attempting to
answer. It is possible that, although items were arranged in
[order of difficulty, the] Ss did not perceive the items as having
different degrees of difficulty. That is, the Ss may not have
perceived any difference between difficult items (e.g., median
difficulty, pm about .20), medium items (e.g., pm about .50), and
easy items (e.g., pm about .80). If this were the case, the
[illegible] effects predicted by adaptation level theory would not
[illegible] results. Multivariate techniques could be applied to
[this type] of data by considering sets of easy items, medium
items, and hard items as subtests (dependent variables)
in a multivariate analysis of variance.

This study was designed to provide answers to the following
questions:

1. Given a test on which items have been set in an easy,
medium, and hard (E-M-H) arrangement, and a test having the
same items arranged H-M-E, will Ss perceive the items on
the two tests differentially? Specifically, will Ss who have
the H-M-E arrangement perceive the H, M, and E sets of
items as being significantly easier than Ss who have these
same sets of items in the E-M-H context?

2. [beginning of question illegible] score on the items?






METHOD 



The students from three sections of Teaching Reading and
Language Arts in the Elementary School, Education 310, at Ohio
University served as Ss for the study. The sections, taught by
the junior author, consisted of 25, 17, and 43 junior level students
for a total of 85 Ss. The Ss had prepared for a midterm examination
covering the basic components of reading instruction in the
elementary grades (i.e., comprehension, word attack skills, material
selection, individualization, diagnosis).

The items for the midterm examination were selected from a
pool of 140 items given to 285 Ss during the preceding 1972-73
Fall quarter. True-false and multiple choice items were selected
on the basis of their item difficulty and discrimination indices.²
Previous investigators have found item difficulty (Brenner, 1964;
Carter, 1942; Davis, 1951; Gibbons, 1940) and item discrimination
(Brenner, 1964) values to be highly reliable. Table 1 presents
the original item pool and the midterm examination medians of the
item indices for the true-false and multiple choice items on the
H, M, and E subtests.

Table 1 about here



Two forms of the examination were prepared.³ The items on
the H-M-E form were arranged as follows: 10 hard multiple choice
items (H-MC), 10 hard true-false items (H-TF), 10 medium M-C
items (M-MC), 10 medium T-F items (M-TF), 10 easy M-C items
(E-MC), and 10 easy T-F items (E-TF). The items on the E-M-H form
were arranged in reverse order of item difficulty, but in the
same order of item type (i.e., multiple choice followed by
true-false).
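
The layout of the two forms can be written out as a short sketch. The block labels and sizes come from the description above; the function names and the (label, position) representation are illustrative assumptions, not part of the original study materials.

```python
# Sketch of the two form layouts. Labels follow the text; the data
# representation and function names are hypothetical.
DIFFICULTIES = ["H", "M", "E"]   # hard, medium, easy
ITEM_TYPES = ["MC", "TF"]        # multiple choice precedes true-false
ITEMS_PER_BLOCK = 10


def build_form(difficulty_order):
    """Return the 60 item slots of one form as (block_label, position) pairs."""
    form = []
    for difficulty in difficulty_order:
        for item_type in ITEM_TYPES:
            label = f"{difficulty}-{item_type}"
            form.extend((label, i) for i in range(1, ITEMS_PER_BLOCK + 1))
    return form


def block_order(form):
    """Collapse a form to its sequence of distinct block labels."""
    order = []
    for label, _ in form:
        if not order or order[-1] != label:
            order.append(label)
    return order


hme_form = build_form(DIFFICULTIES)                  # H-M-E form
emh_form = build_form(list(reversed(DIFFICULTIES)))  # E-M-H form

print(block_order(hme_form))
print(block_order(emh_form))
```

Only the difficulty order is reversed between the forms; within each difficulty level the multiple choice block still comes before the true-false block.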

During the class session prior to the midterm examination
the instructor (the junior author) told the Ss that following
each item on the exam would be a Likert scale on which they were
to rate each of the items. The scale consisted of the choices
(1) very easy, (2) easy, (3) average, (4) difficult, and (5)
very difficult. The Ss were told that if they conscientiously
rated each item, the results would be helpful in retaining,
deleting, or adjusting each item for future examinations.

On the day of the midterm the instructor again reminded the
Ss of the Likert scale and told them that the ratings would be
of most value if they answered the questions in the order
presented. The examiners, the instructor and a proctor, did not
observe anyone who was not complying with the directions.

The examinations were randomly distributed in each classroom,
resulting in 42 Ss taking the E-M-H arrangement and 43 Ss taking
the H-M-E arrangement. There were 60 items and 60 ratings to be
made; therefore, each S was asked to make 120 responses. The
tests were electronically scored.

A multivariate analysis of variance was used to analyze the
data. If this type of analysis yields significant results,
univariate t-tests can be run on the subtest means (Hummel and
Sligo, 1971). Therefore, the following procedures were used to
gain a rough a priori estimate of the power of the statistical
tests.⁴ A "medium" effect size (Cohen, 1969), a measure of the
effect one desires to detect, of .50 was selected for this study.
Cohen (1969, p. 28) indicated that given α = .05, n = 42, and an
effect size of .50, the power for a one-tailed independent
t-test would be .74. That is, population mean differences of
one-half standard deviation would be detected three out of four
times in this study.
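
The tabled power value can be checked approximately with a few lines of code. This is a rough sketch that uses a normal approximation to the t distribution rather than Cohen's tables; the function name is hypothetical, and the approximation is an assumption of this example, not the procedure used in the study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_one_tailed(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a one-tailed independent-groups t-test.

    Normal approximation (an assumption of this sketch): the test
    statistic is treated as normal with mean d * sqrt(n / 2) under
    the alternative hypothesis.
    """
    std_normal = NormalDist()
    noncentrality = effect_size * sqrt(n_per_group / 2)
    critical_z = std_normal.inv_cdf(1 - alpha)
    return std_normal.cdf(noncentrality - critical_z)


power = approx_power_one_tailed(effect_size=0.50, n_per_group=42)
print(round(power, 2))  # 0.74
```

With an effect size of .50 and 42 Ss per group the approximation agrees with the tabled value of .74 to two decimal places.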

RESULTS AND DISCUSSION 



Hotelling's T² (Morrison, 1967), the multivariate analogue
of the univariate t-test, was used to analyze the data. The
twelve dependent variables in the analyses consisted of the
scores on the six parts (H-MC, H-TF, M-MC, M-TF, E-MC, E-TF) of
the midterm, and the six rating scores arrived at by summing the
ratings of the items in each part.
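
As an illustration of how the twelve dependent variables could be assembled for one S, consider the sketch below. The record layout, the function name, and the example values are made up, and scoring each part as the number of items answered correctly is an assumption; the text states only that the ratings were summed within each part.

```python
PARTS = ["H-MC", "H-TF", "M-MC", "M-TF", "E-MC", "E-TF"]


def dependent_variables(correct, ratings):
    """Build the twelve dependent variables for one S.

    correct[part]: list of 0/1 item scores (assumed scoring rule).
    ratings[part]: list of Likert ratings from 1 (very easy)
                   to 5 (very difficult).
    """
    scores = {f"{part} score": sum(correct[part]) for part in PARTS}
    rating_sums = {f"{part} rating": sum(ratings[part]) for part in PARTS}
    return {**scores, **rating_sums}


# Made-up record: 7 of 10 items correct and every item rated "average".
example_correct = {part: [1] * 7 + [0] * 3 for part in PARTS}
example_ratings = {part: [3] * 10 for part in PARTS}
variables = dependent_variables(example_correct, example_ratings)
print(len(variables))
```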

In the analysis the overall multivariate test was significant
(tabled F(.05; 12, 72) = 1.92; calculated F = 7.58) and,
therefore, the univariate t-tests on each dependent variable
were considered. Table 2 presents the means from each group,
the pooled standard error of the mean differences, and the
univariate t-tests for the six ratings.
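
The degrees of freedom reported for the overall test are consistent with the usual two-sample conversion of Hotelling's T² to an F statistic, F = T²(N - p - 1) / (p(N - 2)) with p and N - p - 1 degrees of freedom, where N is the total number of Ss and p the number of dependent variables. The sketch below assumes that conversion was the one used; the T² value it prints is derived from the reported F and does not appear in the original report.

```python
def hotelling_f_degrees_of_freedom(n1, n2, p):
    """Degrees of freedom of the F statistic for a two-sample T-squared."""
    return p, n1 + n2 - p - 1


def t2_from_f(f_value, n1, n2, p):
    """Invert F = T2 * (N - p - 1) / (p * (N - 2)) to recover T-squared."""
    total = n1 + n2
    return f_value * p * (total - 2) / (total - p - 1)


df = hotelling_f_degrees_of_freedom(n1=42, n2=43, p=12)
t_squared = t2_from_f(7.58, n1=42, n2=43, p=12)
print(df)                    # (12, 72)
print(round(t_squared, 2))
```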



Table 2 about here 



The results in Table 2 indicate that the Ss perceived the
E, M, and H multiple choice items as being significantly easier
when attempted in an H-M-E context than when these same items were
attempted in an E-M-H context. When true-false items were
considered, only the difficult, H, items were viewed as being
significantly easier when viewed in an H-M-E context compared
with an E-M-H context. Therefore, the first research question
may be answered in the affirmative for multiple choice items
and difficult true-false items.



The trend of the means in Table 2 suggests that with more E
and M type true-false items or with larger sample size, significant
differences might be found between the mean perceptions of the E
and M type true-false items. That is, across all subtests the Ss
perceived the items in the H-M-E context as being easier than
items in the E-M-H context, but not all of the differences were
significant.



Table 3 about here 



Table 3 presents the means for each group, and the univariate 
t-tests for six subtests. The results presented in Tables 2 and 
3 indicate that although the students perceived most of the items
as being easier in the H-M-E context, there were no significant
differences in the test scores on five of the six subtests.
This result is in agreement with the preponderance of literature
concerned with the topic of item arrangement. Only in the case 
of the E-MC subtest were the resultant means in the same direction 
as the perceived means. Since this result was not consistent 
with the results of the other subtest means, its support must 
be held in abeyance until further replication of this study can 
be made. 

Further research in this area might be done on groups of Ss
who have been differentiated on a pretest as having different
levels of adaptation. Observation of the individual S data in
this study indicates that some Ss may adapt "easily" to the item
difficulties and some may not. For example, one S who took the
E-M-H test had perceived scores on the E, M, H multiple choice
subtests of 1.6, 3.0, 3.6 respectively; another S had scores on
these same tests of 2.7, 2.7, 2.8. It might be conjectured that
Ss who do adapt in the former manner, "easily", to item difficulties
would score differently than Ss who do not. Munz and Smouse (1968)
did find that interactions existed between personality variables
and item arrangements.

This study should also be replicated across other populations
of Ss and content areas. It may be that other Ss (e.g.,
elementary school children) will adapt to item arrangement, and that
their scores will be affected.



FOOTNOTES 



1 "Adaptation level" or "AL": "the hypothesized neutral point or
region of organic functioning at which stimuli coinciding with
AL are indifferent or ineffective, stimuli above AL have a given
character, and stimuli below AL have an opposite or complementary
quality. AL represents the pooled effect of three classes of
factors: (1) stimuli immediately responded to, or in focus of
attention; (2) stimuli having background or contextual influence;
and (3) residuals from past experience with similar stimuli"
(English and English, 1958, p. 11).



2 The index of discrimination was calculated using the net D method
(Marshall and Hales, 1971, p. 230).



3 A better procedure would have been to use eight forms of the
examination so that the true-false and multiple choice item sets
would have been counter-balanced.



4 This procedure will yield only a rough estimate of power since
the calculations should be based on the multivariate model.
However, the authors know of no a priori means of selecting an
effect size for this model.